Skip to content

Conversation

@JonasBa
Copy link
Contributor

@JonasBa JonasBa commented Nov 1, 2025

Fixes #49888 by adding calls to normalizeEncoding.

Benchmarks between #f46152fdb3 and CL from this PR

                                                                                                                 confidence improvement accuracy (*)    (**)   (***)
fs/bench-writeFileSync.js n=1000 func='writeFile' useBuffer='false' length=1024 useFd='false' encoding='utf8'                    1.34 %       ±5.41%  ±7.22%  ±9.43%
fs/bench-writeFileSync.js n=1000 func='writeFile' useBuffer='false' length=1024 useFd='false' encoding='UTF8'                   -0.50 %       ±2.76%  ±3.67%  ±4.78%
fs/bench-writeFileSync.js n=1000 func='writeFile' useBuffer='false' length=1024 useFd='true' encoding='utf8'                    -0.39 %       ±3.46%  ±4.61%  ±6.01%
fs/bench-writeFileSync.js n=1000 func='writeFile' useBuffer='false' length=1024 useFd='true' encoding='UTF8'            ***     57.52 %       ±4.34%  ±5.79%  ±7.55%
fs/bench-writeFileSync.js n=1000 func='writeFile' useBuffer='false' length=102400 useFd='false' encoding='utf8'                 -0.39 %       ±3.91%  ±5.21%  ±6.78%
fs/bench-writeFileSync.js n=1000 func='writeFile' useBuffer='false' length=102400 useFd='false' encoding='UTF8'         ***     66.76 %       ±5.95%  ±7.98% ±10.52%
fs/bench-writeFileSync.js n=1000 func='writeFile' useBuffer='false' length=102400 useFd='true' encoding='utf8'                   0.93 %       ±1.54%  ±2.05%  ±2.67%
fs/bench-writeFileSync.js n=1000 func='writeFile' useBuffer='false' length=102400 useFd='true' encoding='UTF8'          ***    288.37 %       ±5.84%  ±7.79% ±10.20%
fs/bench-writeFileSync.js n=1000 func='writeFile' useBuffer='false' length=1048576 useFd='false' encoding='utf8'                 1.45 %       ±3.28%  ±4.38%  ±5.74%
fs/bench-writeFileSync.js n=1000 func='writeFile' useBuffer='false' length=1048576 useFd='false' encoding='UTF8'        ***     99.65 %       ±5.43%  ±7.29%  ±9.61%
fs/bench-writeFileSync.js n=1000 func='writeFile' useBuffer='false' length=1048576 useFd='true' encoding='utf8'                  2.30 %       ±5.30%  ±7.07%  ±9.24%
fs/bench-writeFileSync.js n=1000 func='writeFile' useBuffer='false' length=1048576 useFd='true' encoding='UTF8'         ***    246.19 %      ±17.91% ±24.14% ±32.04%

Be aware that when doing many comparisons the risk of a false-positive result increases.
In this case, there are 12 comparisons, you can thus expect the following amount of false-positive results:
  0.60 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.12 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)

I am open to suggestions here, but given that in the case of fast paths, the only encoding we care about is utf8, would it perhaps make more sense to have a isUTF8Encoding() which would return a bool?

Reviews and suggestions are much appreciated 🙏🏼

@JonasBa JonasBa requested review from CanadaHonk and anonrig November 1, 2025 22:19
@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/performance

@nodejs-github-bot nodejs-github-bot added fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run. labels Nov 1, 2025
@JonasBa JonasBa force-pushed the fix-utf8-encoding-case-sensitivity branch 3 times, most recently from 9143536 to 121725f Compare November 1, 2025 22:27
Copy link
Member

@ChALkeR ChALkeR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good see next comment

normalizeEncoding has already a short-cut for utf8 and utf-8 (which were directly checked here before), so this should not have any measurable negative effect for those common cases, but optimizes non-common ones

Copy link
Member

@ChALkeR ChALkeR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks default encoding to be buffer

normalizeEncoding normalizes undefined to 'utf8', but readFileSync does not set options.encoding to a default value at that point, unlike writeFileSync, and the expectation is to read Buffers by default, not strings

@JonasBa JonasBa force-pushed the fix-utf8-encoding-case-sensitivity branch from 121725f to 862df86 Compare November 2, 2025 14:11
@JonasBa JonasBa force-pushed the fix-utf8-encoding-case-sensitivity branch from 862df86 to 771bd39 Compare November 2, 2025 14:27
@RafaelGSS RafaelGSS added the performance Issues and PRs related to the performance of Node.js. label Nov 2, 2025
Copy link
Member

@ChALkeR ChALkeR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eeeh

this also demonstrates how flush option is broken in fast write path
it's does not appear to be the fault of this PR but an existing bug though

upd: #60552

@codecov
Copy link

codecov bot commented Nov 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.56%. Comparing base (f46152f) to head (771bd39).
⚠️ Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #60539      +/-   ##
==========================================
- Coverage   88.58%   88.56%   -0.03%     
==========================================
  Files         704      704              
  Lines      207839   207844       +5     
  Branches    40051    40043       -8     
==========================================
- Hits       184117   184076      -41     
- Misses      15791    15809      +18     
- Partials     7931     7959      +28     
Files with missing lines Coverage Δ
lib/fs.js 98.18% <100.00%> (+<0.01%) ⬆️

... and 44 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ChALkeR
Copy link
Member

ChALkeR commented Nov 2, 2025

CI is going to fail unless #60552 is fixed first, which is though unrelated directly
Usage which only supplies { flush: true } in options, but relies on default encoding, will be negatively affected by this PR until that gets resolved

Basically, #60552 fix has to go in first. (Upd: #60553)
This PR is not at fault though and should not need further changes except for rebase

@JonasBa
Copy link
Contributor Author

JonasBa commented Nov 2, 2025

@ChALkeR, I appreciate the thorough review and explanation here, thank you so much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run. performance Issues and PRs related to the performance of Node.js.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fs: utf8 fast paths don't accept all valid utf8 values

5 participants